# Audio Understanding
VideoLLaMA2.1 7B AV CoT
Apache-2.0
VideoLLaMA2.1-7B-AV is a multimodal large language model for audio-visual question answering. It processes video and audio inputs together to answer questions and generate descriptions.
Video-to-Text
Transformers English

lym0302
34
0
Qwen2 Audio 7B Instruct 4bit
A 4-bit quantized version of Qwen2-Audio-7B-Instruct, an audio-text multimodal large language model originally developed by Alibaba Cloud.
Audio-to-Text
Transformers

alicekyting
1,090
6